AITopics | defense method

Collaborating Authors

defense method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

03df5246cc78af497940338dd3eacbaa-Paper-Conference.pdf

Neural Information Processing SystemsMay-1-2026, 01:32:05 GMT

artificial intelligence, machine learning, perturbation, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.92)
Asia (0.68)
North America > Canada > British Columbia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Security & Privacy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

the Fine tuning Process of on Poisoned

Neural Information Processing SystemsApr-30-2026, 03:38:50 GMT

In this section, we show our empirical observations obtained from fine-tuning PLMs on poisoned494 datasets. Specifically, we demonstrate that the backdoor triggers are easier to learn from the lower495 layers than the features corresponding to the main task. This observation plays a pivotal role in496 designing and understanding our defense algorithm. In our experiment, we focus on the SST-2497 dataset [30] and consider the widely adopted word-level backdoor trigger and the more stealthy498 style-level trigger. For the word-level trigger, we follow the approach in prior work [25] and adopt the499 meaningless word "bb" as the trigger to minimize its impact on the original text's semantic meaning.500

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Add feedback

0799492e7be38b66d10ead5e8809616d-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 10:10:11 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > China (0.68)
Europe (0.67)
North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Defending against Data-Free Model Extraction by Distributionally Robust Defensive Training

Neural Information Processing SystemsApr-24-2026, 05:29:47 GMT

Data-Free Model Extraction (DFME) aims to clone a black-box model without knowing its original training data distribution, making it much easier for attackers to steal commercial models. Defense against DFME faces several challenges: (i) effectiveness; (ii) efficiency; (iii) no prior on the attacker's query data distribution and strategy. However, existing defense methods: (1) are highly computation and memory inefficient; or (2) need strong assumptions about attack data distribution; or (3) can only delay the attack or prove a model theft after the model stealing has happened. In this work, we propose a Memory and Computation efficient defense approach, named MeCo, to prevent DFME from happening while maintaining the model utility simultaneously by distributionally robust defensive training on the target victim model. Specifically, we randomize the input so that it: (1) causes a mismatch of the knowledge distillation loss for attackers; (2) disturbs the zerothorder gradient estimation; (3) changes the label prediction for the attack query data. Therefore, the attacker can only extract misleading information from the black-box model. Extensive experiments on defending against both decision-based and scorebased DFME demonstrate that MeCo can significantly reduce the effectiveness of existing DFME methods and substantially improve running efficiency.

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland (0.28)

Industry: Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Security & Privacy (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Constructing Unrestricted Adversarial Examples with Generative Models

Neural Information Processing SystemsMar-16-2026, 23:02:51 GMT

Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, we propose a new class of adversarial examples that are synthesized entirely from scratch using a conditional generative model, without being restricted to norm-bounded perturbations. We first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over data samples. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model and are misclassified by a target classifier. We demonstrate through human evaluation that these new kind of adversarial images, which we call Generative Adversarial Examples, are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that generative adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.64)

Add feedback

SampDetox: Black-box Backdoor Defense via Perturbation-based Sample Detoxification Y anxin Y ang

Neural Information Processing SystemsFeb-18-2026, 09:07:14 GMT

MLaaS are limited by their reliance on training samples or white-box model analysis, highlighting the need for a black-box backdoor purification method.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation (0.70)
Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(3 more...)

Add feedback

e7938ede51225b490bb69f7b361a9259-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 17:25:33 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Add feedback

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots Ruixiang T ang

Neural Information Processing SystemsFeb-17-2026, 17:25:28 GMT

In the field of natural language processing, the prevalent approach involves fine-tuning pretrained language models (PLMs) using local samples.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > Nepal (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Neural Information Processing SystemsFeb-16-2026, 17:10:47 GMT

In addition, we introduce a reversible mapping to determine the defensive target label.

artificial intelligence, backdoor attack, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
(2 more...)

Add feedback

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing SystemsFeb-16-2026, 13:52:43 GMT

However, Does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods.

data mining, machine learning, purified model, (20 more...)

Neural Information Processing Systems

Country: